
    Optimal affine image normalization approach for optical character recognition

    Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation that produces an image as if it had been captured at an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for the camera. Thus, in theory, the normalization should be projective. Usually, the camera optical axis is approximately perpendicular to the document surface, so the projective normalization can be replaced with an affine one without a significant loss of accuracy. An affine image transformation is performed significantly faster than a projective normalization, which is important for OCR on mobile devices. In this work, we propose a fast approach for image normalization. It uses an affine normalization instead of a projective one when this does not cause a significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: the root mean square (RMS) coordinate discrepancy over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We have established that this unconstrained optimization is quadratic and can be reduced to the integration of fractional quadratic functions over the ROI. The latter problem was solved analytically for the OCR case, where the ROI consists of rectangles. The proposed approach is generalized to cases where special cases of the affine transform are used instead: scaling, translation, shearing, and their superpositions, which further accelerates the image normalization procedure. This work was partially supported by the Russian Foundation for Basic Research (projects 18-29-26035 and 17-29-03370).
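    The optimization at the core of this approach can be illustrated numerically. The sketch below (Python with NumPy) is not the paper's analytic solution: it finds a least-squares affine approximation of a given projective normalization over a rectangular ROI by sampling grid points; the homography H, the ROI, and all helper names are illustrative assumptions.

```python
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])       # homogeneous coordinates
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]

def best_affine_over_roi(H, roi, n=50):
    """Least-squares affine approximation of homography H over a rectangular ROI.

    roi = (x0, y0, x1, y1). Returns a 2x3 affine matrix minimizing the RMS
    coordinate discrepancy against H on an n x n grid of sample points
    (a discrete stand-in for the paper's analytic integration over the ROI).
    """
    x0, y0, x1, y1 = roi
    xs, ys = np.meshgrid(np.linspace(x0, x1, n), np.linspace(y0, y1, n))
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    target = apply_homography(H, pts)                    # projective normalization
    design = np.hstack([pts, np.ones((len(pts), 1))])    # rows [x, y, 1]
    # Solve design @ A.T ~= target in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(design, target, rcond=None)
    return A_T.T                                         # 2x3 affine matrix

def rms_discrepancy(H, A, roi, n=50):
    """RMS distance between projective and affine mappings over the ROI."""
    x0, y0, x1, y1 = roi
    xs, ys = np.meshgrid(np.linspace(x0, x1, n), np.linspace(y0, y1, n))
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    diff = apply_homography(H, pts) - (np.hstack([pts, np.ones((len(pts), 1))]) @ A.T)
    return np.sqrt((diff ** 2).sum(axis=1).mean())

# Example: a mildly projective homography over a hypothetical text-line rectangle.
H = np.array([[1.02, 0.05, 3.0],
              [0.01, 0.98, -2.0],
              [1e-4, 5e-5, 1.0]])
A = best_affine_over_roi(H, roi=(0, 0, 400, 50))
print(rms_discrepancy(H, A, roi=(0, 0, 400, 50)))  # use affine if below a chosen threshold
```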

    U-Net-bin: hacking the document image binarization contest

    Image binarization is still a challenging task in a variety of applications. In particular, the Document Image Binarization Contest (DIBCO) is organized regularly to track the state-of-the-art techniques for historical document binarization. In this work we present a binarization method that was ranked first in the DIBCO'17 contest. It is a convolutional neural network (CNN) based method that uses the U-Net architecture, originally designed for biomedical image segmentation. We describe our approach to training data preparation and contest ground truth examination, and provide multiple insights into its construction (the so-called hacking). This led to a more accurate statement of the historical document binarization problem with respect to the challenges one can face in open-access datasets. A Docker container with the final network, along with all the supplementary data used in the training process, has been published on GitHub. The work was partially funded by the Russian Foundation for Basic Research (projects 17-29-07092 and 17-29-07093).
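    To illustrate the architecture family involved, below is a minimal U-Net-like network for patch-wise binarization in PyTorch. It is not the DIBCO'17 winning network; all layer widths, patch sizes, and names are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 convolutions with ReLU, as in the classic U-Net."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """A small U-Net-like network for per-pixel binarization of grayscale patches."""
    def __init__(self, base=16):
        super().__init__()
        self.enc1 = conv_block(1, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)   # logit of "ink" probability per pixel
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# Training-step sketch: grayscale patches vs. binary ground truth masks.
model = TinyUNet()
criterion = nn.BCEWithLogitsLoss()
x = torch.rand(4, 1, 128, 128)                   # stand-in for document patches
y = (torch.rand(4, 1, 128, 128) > 0.9).float()   # stand-in for ground truth
loss = criterion(model(x), y)
loss.backward()
```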

    Screen recapture detection based on color-texture analysis of document boundary regions

    This paper examines presentation attack detection for the case when a document recaptured from a screen is presented instead of the original document. We propose an algorithm based on analyzing the moiré pattern within document boundary regions as a distinctive feature of the recaptured image. It is assumed that a pattern overlapping the document boundaries is a recapture artifact rather than a match between the document and background textures. To detect such a pattern, we propose an algorithm that employs the result of the fast Hough transform of the document boundary regions with enhanced pattern contrast. The algorithm performance was measured on the open dataset DLC-2021, which contains images of mock documents as originals and their screen recaptures. The precision of the proposed solution was evaluated as 95.4% and the recall as 90.5%. This work was partially supported by the Russian Foundation for Basic Research (project 18-29-26035).
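    A simplified sketch of the boundary-region analysis is given below. It substitutes the standard OpenCV Hough transform for the fast Hough transform used in the paper, and the contrast-enhancement settings, thresholds, and scoring heuristic are assumptions rather than the published algorithm.

```python
import cv2
import numpy as np

def recapture_score(image, boundary_strip, hough_thresh=60):
    """Heuristic moire-pattern score for one document boundary strip.

    boundary_strip = (x, y, w, h) crop around a document edge. The simplified idea:
    enhance local contrast, then count near-parallel Hough lines inside the strip;
    a dense family of parallel stripes crossing the boundary hints at a moire
    pattern produced by recapturing a screen.
    """
    x, y, w, h = boundary_strip
    gray = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # Contrast enhancement makes faint moire stripes visible to edge detection.
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    edges = cv2.Canny(enhanced, 50, 150)
    # The standard Hough transform stands in for the fast Hough transform here.
    lines = cv2.HoughLines(edges, 1, np.pi / 180, hough_thresh)
    if lines is None:
        return 0.0
    thetas = lines[:, 0, 1]
    # Moire stripes form a large group of lines with almost the same angle.
    hist, _ = np.histogram(thetas, bins=36, range=(0, np.pi))
    return hist.max() / max(len(thetas), 1)

# Usage sketch: classify as "recaptured" if any boundary strip scores high.
# img = cv2.imread("document.jpg")
# strips = [(0, 0, img.shape[1], 40)]   # e.g. a strip along the top document edge
# is_recaptured = any(recapture_score(img, s) > 0.5 for s in strips)
```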

    Two calibration models for compensation of the individual elements properties of self-emitting displays

    In this paper, we examine the applicability limits of different methods for compensating the individual properties of self-emitting displays with significant non-uniformity of chromaticity and maximum brightness. The aim of the compensation is to minimize the perceived image non-uniformity. Compensation of the displayed image non-uniformity is based on minimizing the perceived distance between the target (ideally displayed) image and the simulated image displayed by the calibrated screen. The S-CIELAB model of the human visual system is used to estimate the perceived distance between two images. In this work, we compare the efficiency of the channel-wise and linear (with channel mixing) compensation models depending on the models of variation in the characteristics of the display elements (subpixels). We found that even for a display with uniform subpixel chromaticity characteristics, the linear model with channel mixing is superior in terms of compensation accuracy. This work was supported by the Russian Science Foundation (project 20-61-47089).
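    The difference between the two compensation models can be illustrated with a toy least-squares fit. The sketch below uses a plain Euclidean residual in XYZ in place of the S-CIELAB metric used in the paper, and the subpixel characterization matrices are hypothetical.

```python
import numpy as np

def channelwise_compensation(M_pixel, M_ref):
    """Per-channel gains g so that the pixel's primaries, scaled by g, best match
    the reference primaries (diagonal model, no channel mixing)."""
    # M_pixel, M_ref: 3x3, columns = emitted XYZ of the R, G, B subpixels at full drive.
    g = np.array([np.linalg.lstsq(M_pixel[:, i:i + 1], M_ref[:, i], rcond=None)[0][0]
                  for i in range(3)])
    return np.diag(g)

def linear_compensation(M_pixel, M_ref):
    """Full 3x3 mixing matrix C so that M_pixel @ C ~= M_ref (channel mixing)."""
    return np.linalg.lstsq(M_pixel, M_ref, rcond=None)[0]

# Toy example: a pixel whose green subpixel is dim and slightly shifted in chromaticity.
M_ref = np.array([[0.41, 0.36, 0.18],
                  [0.21, 0.72, 0.07],
                  [0.02, 0.12, 0.95]])     # hypothetical reference RGB -> XYZ matrix
M_pixel = M_ref.copy()
M_pixel[:, 1] *= 0.8                       # dimmer green subpixel
M_pixel[0, 1] += 0.03                      # chromaticity shift of the green subpixel

for C in (channelwise_compensation(M_pixel, M_ref), linear_compensation(M_pixel, M_ref)):
    err = np.linalg.norm(M_pixel @ C - M_ref)
    print(err)   # the mixing model reaches a lower residual than the diagonal one
```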

    Towards a unified framework for identity documents analysis and recognition

    Identity document recognition goes far beyond classical optical character recognition problems. Automated ID document recognition systems are tasked not only with the extraction of editable and transferable data, but also with performing identity validation and preventing fraud, with an increasingly high cost of error. A significant amount of research is directed at the creation of ID analysis systems focused on a specific subset of document types or a particular mode of image acquisition; however, one of the challenges of the modern world is the increasing demand for identity document recognition from a wide variety of image sources, such as scans, photos, or video frames, as well as in virtually uncontrolled capturing conditions. In this paper, we describe the scope and context of the identity document analysis and recognition problem and its challenges; analyze the existing works on implementing ID document recognition systems; and set the task of constructing a unified framework for identity document recognition that would be applicable to different types of image sources and capturing conditions, as well as scalable enough to support a large number of identity document types. The aim of the presented framework is to serve as a basis for developing new methods and algorithms for ID document recognition, as well as for the far heavier challenges of identity document forensics, fully automated personal authentication, and fraud prevention. This work was partially supported by the Russian Foundation for Basic Research (projects 18-29-03085 and 19-29-09055).

    Advanced Hough-based method for on-device document localization

    The demand for on-device document recognition systems is increasing in conjunction with the emergence of stricter privacy and security requirements. In such systems, there is no data transfer from the end device to third-party information processing servers. The response time is vital to the user experience of on-device document recognition. Combined with the unavailability of discrete GPUs, powerful CPUs, or large RAM capacity on consumer-grade end devices such as smartphones, these time limitations put significant constraints on the computational complexity of algorithms intended for on-device execution. In this work, we consider document localization in an image without prior knowledge of the document content or its internal structure. According to published works, at least five systems offer solutions for on-device document localization. All these systems use a localization method that can be considered Hough-based. The precision of such systems appears to be lower than that of state-of-the-art solutions, which were not designed to account for limited computational resources. We propose an advanced Hough-based method. In contrast with other approaches, it accounts for the geometric invariants of the central projection model and combines both edge and color features for document boundary detection. The proposed method achieved the second-best precision on the SmartDoc dataset, surpassed only by a U-Net-like neural network. When evaluated on the more challenging MIDV-500 dataset, the proposed algorithm showed the best precision among published methods while retaining its applicability to on-device computation. This work is partially supported by the Russian Foundation for Basic Research (projects 18-29-26035 and 19-29-09092).
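    A very rough sketch of the Hough-based backbone common to such methods is given below. It only intersects the strongest near-horizontal and near-vertical boundary lines and omits the projective invariants and combined edge/color scoring that distinguish the proposed method; all thresholds and helper names are illustrative.

```python
import cv2
import numpy as np

def line_intersection(l1, l2):
    """Intersection of two lines given in the (rho, theta) Hough parametrization."""
    (r1, t1), (r2, t2) = l1, l2
    A = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
    return np.linalg.solve(A, np.array([r1, r2]))

def locate_document(image):
    """Rough Hough-based localization: pick one strong line per side and intersect them."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)
    if lines is None:
        return None
    horiz = [l[0] for l in lines if abs(np.sin(l[0][1])) > 0.9]   # near-horizontal
    vert = [l[0] for l in lines if abs(np.cos(l[0][1])) > 0.9]    # near-vertical
    if len(horiz) < 2 or len(vert) < 2:
        return None
    top = min(horiz, key=lambda l: abs(l[0]))      # line closest to the image top
    bottom = max(horiz, key=lambda l: abs(l[0]))
    left = min(vert, key=lambda l: abs(l[0]))
    right = max(vert, key=lambda l: abs(l[0]))
    corners = [line_intersection(a, b)
               for a, b in ((top, left), (top, right), (bottom, right), (bottom, left))]
    return np.array(corners)   # candidate quadrilateral, clockwise from top-left
```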

    X-ray tomography: the way from layer-by-layer radiography to computed tomography

    The methods of X-ray computed tomography allow us to study the internal morphological structure of objects in a non-destructive way. The evolution of these methods is similar in many respects to the evolution of photography, where complex optics were replaced by mobile phone cameras and the computers built into the phone took over the functions of high-quality image generation. X-ray tomography originated as a method of hardware-based, non-invasive imaging of a particular internal cross-section of the human body. Today, thanks to advanced reconstruction algorithms, the method makes it possible to reconstruct a digital 3D image of an object with submicron resolution. In this article, we analyze the tasks that the software part of a tomographic complex has to solve in addition to managing the data collection process. Issues that are still considered open are also discussed. The relationship between the spatial resolution of the method, its sensitivity, and the radiation load is reviewed. An innovative approach to the organization of tomographic imaging, called “reconstruction with monitoring”, is described. This approach makes it possible to reduce the radiation load on the object by a factor of at least 2–3. In this work, we show that as X-ray computed tomography moves towards higher spatial resolution and lower radiation load, the software part of the method becomes increasingly important. This work was supported by the Russian Foundation for Basic Research (projects 18-29-26033 and 18-29-26020).

    Neural network regularization in the problem of few-view computed tomography

    Computed tomography allows the inner morphological structure of an object to be reconstructed without physically destroying it. The accuracy of digital image reconstruction directly depends on the measurement conditions of the tomographic projections, in particular on the number of recorded projections. In medicine, the number of measured projections is reduced in order to lower the patient's radiation dose. However, in few-view computed tomography, where only a small number of projections is available, standard reconstruction algorithms lead to degradation of the reconstructed images. The main feature of our approach to few-view tomography is that the algebraic reconstruction is refined by a neural network while the measured projection data are preserved, because the additive correction lies in the null space of the forward projection operator. The final reconstruction is the sum of the correction computed with the neural network and the algebraic reconstruction. The former is an element of the null space of the forward projection operator; the latter is an element of its orthogonal complement, obtained by applying the algebraic reconstruction method to the few-view sinogram. The dependency between the algebraic reconstruction and the null-space component is modeled with a neural network. We demonstrate that the suggested approach achieves better reconstruction accuracy and computation time than state-of-the-art approaches on test data from the Low Dose CT Challenge dataset without increasing the reprojection error. This work was partly supported by the Russian Foundation for Basic Research (grants 18-29-26020 and 19-01-00790).
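    The key property, that a null-space correction does not change the reprojection of the measured data, can be checked with a small linear-algebra sketch. The toy forward operator and the random stand-in for the network output below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward projection operator A: few-view, so far fewer measurements than pixels.
n_pixels, n_meas = 64, 20
A = rng.normal(size=(n_meas, n_pixels))

# "Algebraic" reconstruction consistent with the measured sinogram (minimum-norm solution),
# which lies in the row space of A.
x_true = rng.normal(size=n_pixels)
sinogram = A @ x_true
x_alg = np.linalg.pinv(A) @ sinogram

# Stand-in for the neural network output: some correction of the reconstruction.
correction = rng.normal(size=n_pixels)

# Keep only the null-space component of the correction:
# c_null = (I - A^+ A) c, so that A @ c_null = 0.
P_null = np.eye(n_pixels) - np.linalg.pinv(A) @ A
c_null = P_null @ correction

x_final = x_alg + c_null

# The reprojection of the final image still matches the measured data exactly,
# i.e. the regularization does not increase the reprojection error.
assert np.allclose(A @ x_final, sinogram)
```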

    Document image analysis and recognition: a survey

    This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for a long time, yet the topic remains relevant and research continues, as evidenced by the large number of associated publications and reviews. However, most of these works and reviews are devoted to individual recognition tasks. In this review, the entire set of methods, approaches, and algorithms necessary for document recognition is considered. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with printed and handwritten content, with a fixed template or a flexible structure, and digitized in different ways: by scanning, photographing, or video recording. Here, we consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. The groups of methods necessary for the recognition of a single page image are examined: classical computer vision algorithms (keypoints, local feature descriptors, fast Hough transforms, image binarization) and modern neural network models for document boundary detection, document classification, document structure analysis (localization of text blocks and tables), extraction and recognition of document fields, and post-processing of recognition results. The review provides a description of publicly available datasets for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition algorithms are also described. The reported study was funded by RFBR, project number 20-17-50177. The authors thank Sc. D. Vladimir L. Arlazarov (FRC CSC RAS), Pavel Bezmaternykh (FRC CSC RAS), Elena Limonova (FRC CSC RAS), Ph. D. Dmitry Polevoy (FRC CSC RAS), Daniil Tropin (LLC “Smart Engines Service”), Yuliya Chernysheva (LLC “Smart Engines Service”), and Yuliya Shemyakina (LLC “Smart Engines Service”) for valuable comments and suggestions.

    T and F asymmetries in π0 photoproduction on the proton

    The γp→π0p reaction was studied at laboratory photon energies from 425 to 1445 MeV with a transversely polarized target and a longitudinally polarized beam. The beam-target asymmetry F was measured for the first time and new high precision data for the target asymmetry T were obtained. The experiment was performed at the photon tagging facility of the Mainz Microtron (MAMI) using the Crystal Ball and TAPS photon spectrometers. The polarized cross sections were expanded in terms of associated Legendre functions and compared to recent predictions from several partial-wave analyses. The impact of the new data on our understanding of the underlying partial-wave amplitudes and baryon resonance contributions is discussed